Modeling the Complexity and Descriptive Adequacy of Construction Grammars
نویسنده
چکیده
This paper uses the Minimum Description Length paradigm to model the complexity of CxGs (operationalized as the encoding size of a grammar) alongside their descriptive adequacy (operationalized as the encoding size of a corpus given a grammar). These two quantities are combined to measure the quality of potential CxGs against unannotated corpora, supporting discovery-device CxGs for English, Spanish, French, German, and Italian. The results show (i) that these grammars provide significant generalizations as measured using compression and (ii) that more complex CxGs with access to multiple levels of representation provide greater generalizations than single-representation CxGs. 1 Complexity and Descriptive Adequacy Construction Grammars (CxGs; Goldberg, 2006; Langacker, 2008) operate at multiple levels of representation (lexical, syntactic, and semantic) making them potentially much more complex than purely syntactic grammars. This paper models both (i) the computational complexity of CxGs and (ii) their descriptive adequacy against unannotated corpora using Minimum Description Length (MDL). These two measures, complexity and descriptive adequacy, can be used together as an objective function for measuring the quality of CxGs: the optimum grammar balances higher descriptive adequacy against lower complexity. Once we can measure the quality of a particular grammar in reference to a corpus of observed language use, we can search until we find the optimum grammar for that corpus. This paper uses measures of complexity and descriptive adequacy to learn CxGs for English, Spanish, French, German, and Italian. The goal is not to examine the representational capacity of CxGs in general because CxG is a fundamentally usage-based paradigm (Hopper, 1987; Kay & Fillmore, 1999; Bybee, 2006). This means that the general capacity of its grammars must be weighted by their actual content: how can we model the complexity of a specific CxG used to describe a specific language, where that language is represented by a specific observable corpus? Previous computational work on CxG (Steels, 2004; Bryant, 2004; Chang, et al., 2012; Steels, 2012) has relied on introspection-based representations that require a linguist to determine the optimum constructions by intuition. From a linguistic perspective, these representations are neither replicable nor falsifiable and are unable to test hypotheses about the mechanisms of emergence that map from observed usage to learned generalizations. From a computational perspective, these representations are not scalable across domains and languages and are subject to all the constraints of knowledge-based systems. Other data-driven approaches (Wible & Tsao, 2010; Forsberg, et al., 2014) generate potential constructions but do not evaluate the quality of competing CxGs as collections of constructions. Section 2 discusses how CxGs are represented and Section 3 considers interactions between different levels of representation. Section 4 presents Minimum Description Length as a joint measure of complexity and descriptive adequacy suitable for measuring grammar quality while Section 5 operationalizes CxG encoding. Section 6 describes the search algorithm for optimizing grammar quality. Section 7 describes a multi-lingual experiment in measuring CxG complexity and descriptive adequacy and Appendix A discusses constructions learned from the corpus of English.
منابع مشابه
Conciseness of Associative Language Descriptions
Associative Language Descriptions are a recent grammar model, theoretically less powerful than Context Free grammars, but adequate for describing the syntax of programming languages. ALD do not use nonterminal symbols, but rely on permissible contexts for specifying valid syntax trees. In order to assess ALD adequacy, we analyze the descriptional complexity of structurally equivalent CF and ALD...
متن کاملAssociative Definition of Programming Languages1
Associative Language Descriptions are a recent grammar model, theoretically less powerful than Context Free grammars, but adequate for describing the syntax of programming languages. ALD do not use nonterminal symbols, but rely on permissible contexts for specifying valid syntax trees. In order to assess ALD adequacy, we analyze the descriptional complexity of structurally equivalent CF and ALD...
متن کاملThe effect of language complexity and group size on knowledge construction: Implications for online learning
This study investigated the effect of language complexity and group size on knowledge construction in two online debates. Knowledge construction was assessed using Gunawardena et al.’s Interaction Analysis Model (1997). Language complexity was determined by dividing the number of unique words by total words. It refers to the lexical variation. The results showed that...
متن کاملDescriptional Complexity of Generalized Forbidding Grammars
This paper discusses the descriptional complexity of generalized forbidding grammars in context of degrees, numbers of nonterminals and conditional productions, and a new descriptional complexity measure—an index—of generalized forbidding grammars.
متن کامل